{epiprocess} & {epipredict}
R packages to ramp up forecasting systems
Stanford STATS/BIODS 352 — 12 April 2023
The Covid-19 pandemic required quickly implementing forecasting systems.
Basic processing—outlier detection, reporting issues, geographic granularity—was implemented in parallel by many groups and was error prone
Data revisions complicate evaluation
Simple models often outperformed complicated ones
Custom software not easily adapted / improved by other groups
Hard for public health actors to borrow / customize community techniques
{epipredict}: you can do a limited amount of customization.
We currently provide:
death_rate, 1 week ahead, with 0, 7, 14 day lags of cases and deaths
lm for estimation; also creates "intervals"
The output is basically ready to submit to the COVID-19 ForecastHub
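These defaults can be invoked with a single canned-forecaster call. A minimal sketch, assuming `jhu` is an `epi_df` containing `case_rate` and `death_rate` columns (the same data used in the customized example that follows); the exact output fields shown are not guaranteed:

```r
library(epipredict)

# Canned AR forecaster with its defaults: lm engine, death_rate
# one week ahead from 0/7/14-day lags of cases and deaths,
# plus quantile "intervals" ready for the COVID-19 ForecastHub.
out <- arx_forecaster(
  epi_data = jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate")
)
out$predictions # tibble of point forecasts with quantile intervals
```

Everything the customized call below overrides (trainer, ahead, lags, quantile levels) is an optional argument here.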
rf <- arx_forecaster(
  epi_data = jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate", "fb-survey"),
  trainer = parsnip::rand_forest(mode = "regression"), # use ranger
  args_list = arx_args_list(
    ahead = 14, # 2-week horizon
    lags = list(c(0:4, 7, 14), c(0, 7, 14), c(0:7, 14)), # a bunch of lags
    levels = c(0.01, 0.025, 1:19/20, 0.975, 0.99), # 23 ForecastHub quantiles
    quantile_by_key = "geo_value" # vary q-forecasts by location
  )
)

{epipredict}: a very specialized plug-in to {tidymodels}
# A preprocessing "recipe" that turns raw data into features / response
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 14) %>%
  step_epi_naomit()
# A postprocessing routine describing what to do to the predictions
f <- frosting() %>%
  layer_predict() %>%
  layer_threshold(.pred, lower = 0) %>% # predictions/intervals should be non-negative
  layer_add_target_date(target_date = max(jhu$time_value) + 14) %>%
  layer_add_forecast_date(forecast_date = max(jhu$time_value))
# Bundle up the preprocessor, training engine, and postprocessor
# We use quantile regression
ewf <- epi_workflow(r, quantile_reg(tau = c(.1, .5, .9)), f)
# Fit it to data (we could fit this to ANY data that has the same format)
trained_ewf <- ewf %>% fit(jhu)
# Examine the recipe to determine what data we need to make the prediction
latest <- get_test_data(r, jhu)
# We could make predictions using the same model on ANY test data
preds <- trained_ewf %>% predict(new_data = latest)
{epiprocess} & {epipredict} — dajmcdon.github.io/epitooling-stanford-2023